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Abstract 

Background: Protura is a group of tiny, primarily wingless hexapods living in soil habitats. Presently about 800 valid species 
are known. Diagnostic characters are very inconspicuous and difficult to recognize. Therefore taxonomic work constitutes 
an extraordinary challenge which requires special skills and experience. Aim of the present pilot project was to examine if 
DNA barcoding can be a useful additional approach for delimiting and determining proturan species. 

Methodology and Principal Findings: The study was performed on 103 proturan specimens, collected primarily in Austria, 
with additional samples from China and Japan. The animals were examined with two markers, the DNA barcoding region of 
the mitochondrial COI gene and a fragment of the nuclear 28S rDNA (Divergent Domain 2 and 3). Due to the minuteness of 
Protura a modified non-destructive DNA-extraction method was used which enables subsequent species determination. 
Both markers separated the examined proturans into highly congruent well supported clusters. Species determination was 
performed without knowledge of the results of the molecular analyses. The investigated specimens comprise a total of 16 
species belonging to 8 genera. Remarkably, morphological determination in all species exactly mirrors molecular clusters. 
The investigation revealed unusually huge genetic COI distances among the investigated proturans, both maximal 
intraspecific distances (0-21.3%), as well as maximal congeneric interspecifical distances (up to 44.7%). 

Conclusions: The study clearly demonstrates that the tricky morphological taxonomy in Protura has a solid biological 
background and that accurate species delimitation is possible using both markers, COI and 28S rDNA. The fact that both 
molecular and morphological analyses can be performed on the same individual will be of great importance for the 
description of new species and offers a valuable new tool for biological and ecological studies, in which proturans have 
generally remained undetermined at species level. 
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Introduction 

A great part of the megadiverse arthropods is soil dwelling [1]. 
The stronger the association to soil the more likely morphological 
adaptations are to be present, such as elongated or cylindrical 
body shapes and frequently shortened, reduced or absent body 
appendages, such as antennae, legs or cerci [2]. In extreme cases 
this leads to a paucity of morphological characters which impedes 
species delineation and determination. For many of these taxa only 
few specialized and experienced taxonomists are able to accurately 
identify species [3]. 

In our study we focused on Protura, a poorly known group of 
primarily wingless hexapods. Due to their small body size (body 
length between 0.5-2.5 mm) and their hidden life style in the 
euedaphic soil region our knowledge of their biology and ecology 
still remains very fragmented (for review see [4]). Proturans are 



one of the most peculiar groups of hexapods with several unique 
characters. The most distinguishing character pertains to their 
front legs, which are held forward and presumably functionally 
compensate the lack of antenna and its sensory features [5] . The 
morphology of Protura is very homogeneous and determination is 
utterly demanding. All important diagnostic characters are difficult 
to distinguish, such as are the shape of sensilla and their location 
on the foretarsi and body, the presence or absence of the tracheal 
system, the structure of abdominal legs, the shape of the maxillary 
gland, the maxillary and labial palps, the "striate band" on the 
abdominal segment VIII, the squama genitalis and the porotaxic 
pattern [6-10]. Although Protura is not known from the fossil 
record, there is general agreement that they represent one of the 
earliest branches of the hexapod tree, which may date back to 400 
million years [1 1]. Furthermore, their phylogenetic position within 
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hexapods is still a matter of controversy (for review see [4], [12- 
13]). 

In the last decades, molecular approaches have attracted 
attention attempting to facilitate species identification for a 
broader scientific community including non-experts [14—15], (for 
review see [3]). The most promising and cost-effective approach 
was proposed by [16-17], who introduced the DNA barcoding 
method, which enables species characterization based on a short 
DNA region of a universal standardized marker. For animals this 
marker is represented by a 648 bp long fragment of the 
mitochondrial cytochrome c oxidase subunit 1 (COI) gene. 
Additionally, DNA barcoding can support species delimitation of 
new or cryptic species. Therefore it was suggested to include the 
COI barcode in any new species description along with knowledge 
about morphology, geographical distribution, and other ecological 
and biological data [18]. 

Since the introduction of DNA barcoding, many studies have 
ensued and numerous challenges emerged, such as the "barcoding 
gap", possible overlaps between intra- and interspecific distances, 
limitations of a single-gene approach, and alternative or comple- 
ment markers (for review see [3]). 

In the present study, we aimed to establish DNA barcoding for 
Protura. Until now, the only complete mitochondrial genome was 
published by [19] . Additionally, few partial sequences are available 
for cytb, 3' end of COI, COII and 12S rDNA [19-21]. The first 
complete COI barcodes were published as part of the description 
of the new species Acerentulus charrieri [22], Yamatentomon guoi [23], 
and Hesperentomon yangi [24]. 

Due to the minuteness of proturans, species can only be 
determined unambiguously after a clearing treatment in which all 
tissues are removed and subsequent slide mounting of the 
specimen. Therefore, we adapted a non-destructive DNA-extrac- 
tion method for Protura to enable the following species 
determination on basis of the cuticular skeleton [25]. In our 
barcoding approach we used two molecular markers, the DNA 
barcoding region of COI and a fragment of the nuclear 28S 
rDNA, including the Divergent Domains 2 and 3, to investigate 
whether the species diversity within Protura, as recorded by 
traditional taxonomy, is reflected in the molecular data. 

Materials and Methods 

Ethics statement 

All species used in this study are neither CITES-species nor 
endangered species according to regional Red List (neither Red 
List of Austria, nor Red List of the federal states, where the 
localities lie). As such no special sampling permission is necessary 
for taking soil samples. 

Generally sampling permissions lie in the area of authority of 
the different federal states of Austria - in our case Vienna, Lower 
Austria and Carinthia. For Lower Austria our sampling permission 
number is RU5-BE-939/001-2013. 

Soil sampling was performed at the following locations: 

Study sites and sampling 

Soil samples were taken from three localities in eastern and 
southern Austria (i) Leopoldsberg (Vienna, N: 48°16'36.36" E: 
16°21'00.46") (the worldwide biodiversity hotspot of Protura [26]) 
collected at 09.09.2009, 13.03.2012, and 23.04.2012, (ii) Eichkogel 
(Lower Austria, N: 48°03'45.03" E: 16°17'32.26") collected at 
26.09.2009, and (iii) Twimberger Graben (Carinthia, N: 
46°53'54.02" E: 14°50'54.27") collected at 24.04.2011 and 
01.1 1.201 1. Specimens were expelled from soil samples by Berlese 
funnels and fixed in 100% EtOH. To enlarge the phylogenetic 



coverage of the taxon sampling, we additionally included samples 
from China (Sinentomon erythranum) and Japan (Bacukntulus densus, 
Filientomon takanawanum) (species list Table SI). 

Determination 

Species were identified following the determination key for 
European Protura [7], complemented by details given in [27-29]. 

DNA extraction, PCR amplification and sequencing 

Whole genomic DNA was extracted by a non-destructive 
extraction method [25]. The procedure follows the standard 
protocol of the DNA-extraction kit (Blood & Tissue, Qiagen) but 
with an increased incubation time of 24 hours. The final volume 
of elution buffer was 60 After DNA-extraction the remaining 
cuticle was transferred to 96 % EtOH, washed in Marc Andre I, 
and whole-mounted in Marc Andre II. In total, 171 individuals 
were successfully extracted of which 99 were finally sequenced 
(species list Table SI). 

Two thermocycling profiles were used to amplify fragments of 
COI and 28S rDNA, differing only in the annealing temperature: 
pre-denaturation at 94°C for 3 min, 35 cycles of 1 min at 94°C, 
1 min at 48°C (COI)/ 1 min at 45°C (28S) and 1 min at 72°C, 
final extension step for 5 min at 72°C. All PCR mixes had a total 
volume of 25 and contained 16.4 ul ddH z O, 2 ul template, 
2.5 ul primer [10 \lM; VbC Biotech], 2.5 jxl dNTPs [2 mM each; 
Fermentas], 2.5 ul PCR Buffer [lOx containing 20 mM MgCl 2 ; 
Fermentas DreamTaq], and 1 ul Polymerase [5u/ |J,1; Fermentas 
DreamTaq]. In some cases the addition of 0,5 |J.l MgCl 2 [25 mM; 
Fermentas DreamTaq] yielded better results. PCR products were 
purified using QIAquick PCR Purification kit (Qiagen) and eluted 
in 35 |il AE buffer. COI and 28S fragments were sequenced in 
both directions by VbC Biotech Service GmbH (www.vbc-biotech. 
at/). 

A set of different primers was necessary to successfully amplify 
and sequence fragments of the COI and the 28S rDNA, 
depending mostly on the genus of the given specimen (Table 1). 
Regarding the 28S rDNA, the regions 2 and 3 were amplified and 
sequenced in two overlapping fragments employing three primers 
specified in [30] and then completed by a new forward primer 
slightly moved to the 3' end. In some specimens an additionally 
designed internal reverse primer was necessary to sequence the 
whole 28S fragment. COI sequencing reads were assembled and 
checked by eye for reading frame errors in Bioedit Sequence 
Alignment Editor [31]. 28S reads were assembled with SeqMan 
(Lasergene v.8, DNASTAR) and checked by eye. 

Sequences from GenBank (NCBI) of the complete mitochon- 
drial genome of Sinentomon erythranum (accession number: 
NC015982, accessible since X.2011) and additionally the COI 
DNA barcode of Acerentulus charrieri (accession number: JQ41 1217, 
accessible since III.2012) were downloaded. The sequences of COI 
determined in this study are deposited at BOLD under the project 
name PROTAT. The material is deposited in the collection of the 
University of Vienna and in the collection of the State Museum of 
Natural History of the National Academy of Sciences of Ukraine, 
L'viv (SMNH). 

Alignment and data analysis 

Alignment and tree reconstruction were performed for each 
gene separately. The COI sequences were aligned manually in 
BioEdit Sequence Alignment Editor [31]. The 28S rDNA 
sequences were aligned using default settings of the program 
MUSCLE [32], as implemented in Mega v. 5.05 [33]. To avoid 
errors due to relatively high length variation, subsequently possibly 
misaligned positions were identified using Aliscore v2.0 [34]. 
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Table 1. List of primer pairs and starting positions within the selected markers used in the present study. 



Locus 


Primer name 


Sequences (5 -3 ) 


Position (bp) 


Reference 


COI 


LCO1490 


ggtcaacaaatcataaagatattgg 


23 


[29] 




LepFI 


attcaaccaatcataaagatattgg 


23 


[17] 




CamF1 


ctcractaaccataargatattgg 


26 


present study 




ProtF2 


acgaaccatagggatatcgg 


28 


present study 




CamRI 


taaacttcdggrtgdccaaaaaatc 


707 


present study 




ProtR2 


gagcycatcatatrtttac 


863 


present study 




DiplRI 


gcaataattatdgtdgctgc 


919 


present study 


28S rDNA 


D1a 


cccgcgtaatttaagcatat 


64 


[24] 




D2a 


gatagcgaacaagtacc 


416 


[24] 




D2aprot 


gtaccgcgagggaaagttg 


428 


present study 




D3bint363* 


gagcaccgccgaactgtg 


545 


present study 




D3a 


gacccgtcttgaaacacgga 


950 


[24] 




D3arev* 


tccgtgtttcaagacgggac 


1092 


[24] 




D3arev_prot* 


ctccttggtccgtgtttc 


1102 


present study 




D3b 


tccggaaggaaccagctacta 


1435 


[24] 



Reference sequence Sinentomon erythranum (NCBI accession number: NC015982), * indicates internal primers additionally used in some specimens. 
doi:1 0.1 371 /joumal.pone.0090653.t001 



Aliscore identifies sections in multiple sequence alignments which 
cannot be distinguished from random similarity. Gaps were 
treated as ambiguities and the maximum number of possible 
pairwise comparisons was analysed. Identified random similar 
sections were excluded with Alicutv2.2 (http://www.zfmk.de). 
Neighbor-Joining (NJ) trees based on K2P distances of COI and 
28S rDNA were compiled in Mega v. 5.05 [29], and compared to 
check for congruence of retrieved clusters. The reliability of both 
trees was assessed with 5000 bootstrap replicates. 

COI- and 28S rDNA-trees were edited in Corel Draw Graphics 
Suite X3 (www.coreldraw.softonic.de/). 

To compare intraspecific and interspecific distances and to 
search for a DNA barcoding gap in Protura, a number of distance 
measures were analyzed with Speciesldentifier 1.7.8 [35]. 

To indirectly test for saturation, the ratio of transitions to 
transversions was evaluated using the software DAMBE [36] and 
visualized in a saturation plot. 

Results 

Primer-establishing for COI and 28S rDNA 

The universal COI primer set LCO/ HCO [37] failed to yield 
any results under various PCR conditions. Iterative steps of primer 
refinement finally provided a set of six primer-combinations 
(Table 2). Due to high variation at both ends of the DNA 
barcoding fragment our primers are specific mostly at the genus 
level. These primers cover fragments from 660 to 872 bp length. 
Altogether, we were able to generate 89 COI barcodes with the 
unambiguously readable sequence-length ranging from around 
480 to 860 bp. 

Aside from COI, we tested a small set of 28S rDNA primers 
used by [30] . These primers work efficiently in PCR across almost 
the entire taxon sampling. To successfully sequence all specimens, 
the procedure of using a slightly derived forward, as well as 
internal primers proved necessary (Table 1). In total, we generated 
82 sequences of 28S rDNA including the Divergent Domains and 
conserved regions 2 and 3, all of approximately 1000 bp length. 



In COI for each genus the majority of representatives worked 
with the same primer-combination. Only a few specimens 
required a different primer combination, for example some 
representatives of Acerentomon, Ionescuellum and Filientomon 
(Table 2). COI primer combinations proved to be species- specific 
in Ionescuellum, as well as in Acerentomon italkum and A. sp. gr. 
microrhinus (Table 2). 

The original primer set for the 28S rDNA fragment (D2a and 
D3b) worked in all representatives oiAcerella and Ionescuellum. In all 
other genera, additional primer-combinations were required 
without any hint for species-specifity. In Acerentulus, Eosentomon 
and Filientomon, for example, all three possible combinations had to 
be applied to yield results in all representatives (Table 2). 

Species composition based on morphology 

While determination of the specimens at the genus level 
preceded sequencing, species identification was performed after 
the molecular analyses by Julia Shrubovych, albeit without prior 
knowledge of the molecular results. Morphological species 
determination of the NDE vouchers of our three sampling sites 
resulted in 5 genera and 12 species (Table 3). Eight of the 12 
species were sampled only at one site and therefore represent the 
range of variation within a single population. 

Soil samples from the Leopoldsberg yielded an Acerentomon 
species new to science belonging to the doderoi group (species 
description will follow in a separate paper). Ionescuellum carpaticum 
from Leopoldsberg and Eichkogel and Acerentomon italkum from 
Leopoldsberg represent first records for Austria. 

While NDE itself never impeded determination, the preparation 
of whole-mounts in few instances caused wrinkle formation. 
Consequendy a few specimens could be determined only to the 
genus level, and for some specimens determination at the species 
level was judged to be uncertain (abbreviation cfi). The 
investigated specimens include different developmental stages 
(larva II, maturus junior, preimago, and adult). 
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Table 2. Primer-combinations for COI and 28S rDNA amplification. 





Genus 


Primer-Combination COI 


Primer-Combination 28S rDNA 


Acerella 


CamFI + DiplRI (2) 


D2a + D3b (2) 


Acerentomon 


ProtF2 + ProtR2 (37) 


Dla + D3b (24, A. n. sp. gr. doderoi) 




LepFI + DiplRI (1, A. sp. gr. microrhinus) 


D2a + D3b (16) 


CamFI + DiplRI (1, A. italicum) CamFI + CamRI (1) 


Acerentulus 


CamFI + DiplRI (30) 


Dla + D3b (18) 


D2a + D3b (4) 


D2aprot + D3b (3) 


Eosentomon 


LCO + ProtR2 (6) 


D2aprot + D3b (3) 


D2a + D3b (2) 


Dla + D3b (1) 


lonescuellum 


LepFI + Prot R2 (8, /. haybachae) 


D2a + D3b (14) 


CamFI + DiplRI (2, /. carpaticum) CamFI + CamRI (1) 


LCO + ProtR2 (2, /. silvaticum) 


Baculentulus 


LepFI + DiplRI (1) 




Filientomon 


LCO + ProtR2 (2) 


D2a + D3b (1) 




CamFI + CamRI (1) 


Dla + D3b (1) 


D2aprot + D3b (1) 



Numbers of successfully sequenced specimens in parenthesis. Species names in parenthesis indicate that combinations seem to be species-specific. 
doi:1 0.1 371 /journal.pone.0090653.t002 



Comparison between taxonomy and COI distances 

The Neighbor-Joining (NJ) tree of the COI data contains 91 
taxa with an alignment length of 657 bp, and shows 17 maximally 
supported subdivisions (Fig. 1). Remarkably, the analysis retrieved 
all identified genera in monophyletic clusters, and all morpholog- 
ically determined species form monophyletic subclades. COI 
barcodes captured all species boundaries among the 12 species 
from Austria. 

Within the Acerentomon cluster each subdivision reflects a species, 
each collected from a single locality. lonescuellum is represented by 
three species. Two of them, I. carpaticum and I. haybachae, are 
reported from two different sites. Unfortunately, the only outlier of 



/. haybachae could not be determined unambiguously. The third 
major clade coincided with the genus Acerentulus. A. exiguus is the 
only species in which distances are high among individuals of the 
same population, but even there the values are still lower than 
distances between the two investigated populations. The only 
oudier within Eosentomon could not be determined to species level 
due to the low quality of the whole-mount. 

Generally, Kimura-2-Parameter (K2P) distances within popu- 
lations were very low, the only exception being A. exiguus from 
Twimberger Graben. Whenever two populations of the same 
species are covered, intraspecific distances are very high (up to 
21.3% between populations of/, haybachae). Intrageneric distances 



Table 3. List of morphologically determined species of Protura investigated in this study. 



Genus 


Genus (Abbr.) 


Species 


Locality (Abbr.) 


Acerella 


Aca 


muscorum (lonsecu, 1930) 


LB 


Acerentomon 


Aco 


italicum Nosek, 1969 


LB 






maius Berlese, 1908 


TG 






sp. gr. microrhinus Tuxen 


TG 






n. sp. gr. doderoi 


LB 


Acerentulus 


Ace 


exiguus Conde, 1944 


LB, TG 






tuxeni Rusek, 1966 


LB 


Eosentomon 


Eos 


cetium Szeptycki & Christian 2000 


LB, TG 






sp. gr. delicatum 


LB 


lonescuellum 


Ion 


carpaticum (lonescu, 1930) 


LB, EK 






haybachae Nosek, 1967 


LB, EK 






silvaticum Rusek, 1965 


EK 



Genus abbreviation: Ace = Acere!la, Aco = Acerentomon, Acu = Acerentulus, Eos = Eosentomon, Ion = lonescuellum; Locality abbreviation: LB = Leopoldsberg, 
TG = Twimberger Graben, EK = Eichkogel. 
doi:1 0.1 371 /journal.pone.0090653.t003 
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H P090 Aconspgrdoderoi 
HP1Q2_Aco_nsp_gr_doderoi 
HP'\00_Aco_nsp_gr_doderoi 
HP076_Aco_nsp_gr_doctero/ 
HP085_^co_nsp_gr_doc/era; 
H P086 Aconspgrdoderoi 
HP101_4.co_nsp_gr_rJocfero/ 
HP108 Aconspgrdoderoi 
HP'\35_Aco_nsp_gr_doderoi 
l HP077_Aco_nsp_gr_dodera/ 
HP1 3Q_Aco_nsp_gr_doderoi 
HP 1 52 _Aco_nsp_gr_doderoi 
HP"\06_Aco_nsp_gr_doderoi 
HP~\Q3_Aco_nsp_gr_doderoi 
HP1 50 Aco nsp gr doderoi 
HP079_Aco_nsp_gr_doderoi 
HP089_Aco_nsp_gr_doderoi 
HP ~\ 44 Aco nsp gr doderoi 
HP 1 45 Aco nsp gr doderoi 
HP 1 47 Aco nsp gr doderoi 
HP075_<4co_nsp_gr _doderoi 
HP087 _Aco_nsp_gr_doderoi 
HP096_Aco_nsp_gr_doderoi 
■ HP107 _Aco_nsp_gr_doderoi 
H P 1 46 Acq nsp tff doderoi 
HP046_Aco_maius 

• HPQAlAcomaius 

• HP044_/1 co_maius 

• HP045_-4co_ma/us 

• HP014_dco_sp 

• HP013_/*co_sp 

• HP058_-4co_ma/us 

• HP067 _Aco_ct_maius 
M HP052 _Aco_maius 

64 !• HP059_/lco_n7a/"us 

• HP061_^co_sp_gr_m(crortous 
HP^lAcoJtalicum 

A HP025Jon_cf_s//i'a//cum 



HP008_/on_carpa(/civm 
H P0 1 5_lon_carpaticum 

HPWJontfJtaybachae 
A HP021_/on_cf_/?aybachae 
A HP022_/on_/iayf)ac/iae 
H P026_/on_ftayPac/7ae 
HP027_ton_sp 
,AHP017_/on_sp 
L-A HP001 Jon _haybachae 
"a HPQ23_!on_haybachae 
HP033_Bac_densus 

<0 JQ41 1 21 7_Acu_charrieri 

HP078_Acu_exiguus 
HPM8_Acu_cf_exiguus 
H P 1 58-4 cuexiguus 
H P 1 67 ,4 cu exiguus 
HP1 51 _Acu_ex/guus 
H P 1 6 1 _Acu_exiguus 
H P 1 59_-4cu_exfguus 
HP1 29_^cu_cf_ex/gL/L/s 
HP1 25_Acu_exiguus 
HP1 26_-4c(j_ex/guus 
HP1 27 _Acu_exiguus 
I HP123_.4cu_ex«juus 
HP1 24 Acu exiguus 
HP J \64_AcuJuxeni 
HPlZQAcutuxeni 
HP142 Acutuxeni 
HPlGOAcutuxeni 
HP137 _Acu_c\Juxeni 
HP1 31_Acu_cf Juxeni 
HP1 55_<4cu_(uxen/ 
HP166_^cu_(uxen/ 
HP168 Acti fuxeni 
HP171_Acu__cMuxen/ 
HP140_AcuJ[/xen/ 
HP138_^ct/Jt/xem 
HP141_^cu_((;xem 
HP143_-4cu_(uxeni 
HP163_flcuJuxen/ 
HP165 Acu tuxeni 
Hp-\Z4_Acu_c\Juxeni 
84Ta HP170_^cu_iuxem 
'■ HP009 _Ace_muscorum 
I HP062_Ace_muscorum 
-HP028 Fil takanawanum 



Y 



-0 • HP060_£os_sp 

■ ■ HP162_£os_ce(/um 
9 [-■ HP148_Eos_ceft'um 

ljm».U.M UHP149_Eo S _ce(, Um 
50 \-M HP157 Eos celium 

■ ■ HP169_Eos_ceh'um 
— 0 HQ199311_S/n_eof>ran(™ 



Figure 1. NJ tree based on K2P distances from 91 COI 
sequences of Protura. Newly sequenced specimens labeled with 
lab code number (HP), abbreviation for genus, and species name. Color 
code for genera: Acerentomon = violet, lonescuellum = green, Acerentu- 
lus = orange, Acerella = red, Eosentomon = blue; Austrian sample sites are 
coded with different icons: Leopoldsberg = square, Eichkogel = triangle, 
and Twimberger Graben = circle. Bootstrap support (given below 
nodes) derived from 5000 replicates. Maximally supported clusters 
and subclusters are indicated by black dots. Genus abbreviations: 
Aco = Acerentomon, Ion = lonescuellum, Acu = Acerentulus, Ace = Acerella, 
and Eos = Eosentomon. 
doi:10.1371/journal.pone.0090653.g001 

are very high with maximal congeneric distances ranging around 
30%, the highest distance value (44.7%) occurs between I. 
silvaticum and I. haybachae (Table 4). 

Resolution power of COI and 28S rDNA 

The power of the sequenced 28S rDNA fragment to discrim- 
inate the studied specimens of Protura at the species level equals 
that of the COI barcoding fragment. Differences mainly pertain to 
branch length (Fig. 2 and 3). 

Distances between populations are lower in 28S rDNA, albeit a 
proper comparison suffers from the necessity to exclude regions of 
highest variability since they are not unambiguously alignable 
among all investigated Protura. In lonescuellum haybachae the 
maximum distance between populations from Eichkogel and 
Leopoldsberg are 3.1% (compared to 21.3% in COI), and between 
Eosentomon cetium from the Leopoldsberg and E. sp. from the 
Twimberger Graben 7.5% (15.6% in COI). Topological differ- 
ences are low, but present within the genus lonescuellum and 
profound in the placement of Filientomon takanawanum. In the NJ- 
tree based on 28S rDNA, I. carpaticum clusters with /. silvaticum 
(maximally supported), but with I. haybachae in the COI-tree (for 
complete NJ tree of 28S rDNA see Figure SI). 

Distances of 28S rDNA sequences likewise are shorter within 
Acerentomon, Acerentulus and Acerella (see Tables S3, S4), as reflected 
in shorter internal branches in the tree. Important differences are 
evident in Acerentulus exiguus (see Table S2). Distances are very high 
in COI and the population of Twimberger Graben is further 
subdivided into two subclusters. In contrast 28S rDNA shows no 
distances within this population. The maximum distance between 
the populations of Twimberger Graben and Leopoldsberg is about 
seven times as high in COI (17.5%) as in 28S rDNA (2.5%). Some 
differences are apparent in the topology of the genus Acerentomon. 
In COI, a clade formed by A. sp. gr. microrhinus and A. italicum 
clusters with a clade containing A. maius and A. n.sp. gr. doderoi, 
while 28S rDNA retrieves A. n. sp. gr. doderoi as nearest neighbor to 
a clade comprising all remaining species of the genus and 
Filientomon takanawanum (see also Table S3). Bootstrap support for 
the respective nodes is low in both trees, and the respective 
internal branches are very short. 

Discussion 

Morphological taxonomy is accurately reflected in the 
molecular data 

Morphological determination of Protura is extremely challeng- 
ing and a skill that is restricted to a handful of taxonomists 
worldwide [4]. Identification at the species level strongly relies on 
subtle chaetotaxic characters, where the position and length ratio 
especially of certain foretarsal bristles play a crucial role [7] , [38- 
39]. Both intraspecific variability and anomalies may further 
hamper identification [8], [40-42]. Furthermore it should be 
noted, that the sexual biology and mode of sperm transfer of 
Protura is still enigmatic [4] and we consequendy have no 
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Table 4. K2P distances in COI among species of Protura collected in Austria. 



Species (Abbr.) 


Max intra 


Best intra 


Max congen 


Best match 


Best inter 


Aca muscorum 


0 


0 




Acu exiguus 


31.85-32.3 


Aco itolicum 






33.3 


Aco n. sp. gr. doderoi 


31.2 


Aco maius 


0.3 


0-0.3 


33.3 


Aco n. sp. gr. doderoi 


28.0-28.5 


Aco n. sp. gr. doderoi 


0.6 


0-0.2 


33.0 


Aco maius 


28.0-28.8 


Aco sp. gr. microrhinus 






33.3 


Aco maius 


31.3 


Acu exiguus 


17.5 


0-0.64 


24.5 


Acu tuxeni 


20.5-23.6 


Acu tuxeni 


0.2 


0-0.2 


24.5 


Acu exiguus 


20.5-22.1 


Eos cetium 


1.8 


0.31-0.84 


15.6 


Eos sp. 


14.5-15.6 


Eos sp. 






25.6 


Eos cetium 


14.2 


Ion hoybachae 


21.3 


0-20.63 


44.7 


ton carpaticum 


33.3-34.5 


Ion carpaticum 


6.3 


6.25 


41.2 


ton haybachae & Eos sp. 


33.6 & 32.3 


Ion silvaticum 






44.7 


Acu exiguus 


28.3 



Species described by their maximal intraspecific (Max intra}, and the range of the best intraspecific matches (Best intra), as well as the maximal congeneric interspecific 
distances (Max congen), the name of the best matching species (Best match), along with the range of their best interspecific distances (Best inter). In species, covered by 
a single representative, missing distances are marked by a short line. Species abbreviations as used in Tab. 3. 
doi:1 0.1 371 /journal.pone.0090653.t004 



possibility to check morphospecies from a biospecies concept 
perspective. Therefore, despite the great efforts of excellent 
taxonomists (see the critical catalogue of the Protura of the world 
by [39]), reliability of some morphospecies remains controversial; 
thus support from molecular data is desirable. 

Despite suspected discrepancies, the clustering of our molecular 
data is exactly mirrored in the morphological determination. It 
should be mentioned that the species determination was 
performed without prior knowledge of the results from the 
molecular data. This clearly suggests that the morphospecies 
described by traditional taxonomy have a solid biological 
background and probably represent true biospecies. The seem- 
ingly marginal diagnostic characters used by traditional taxono- 
mists obviously suffice for appropriate species identification of 
Protura. Our sequence data enables distinct species delimitation, 



which is supported by high interspecific distances. Furthermore, 
even different developmental stages and specimens, in which the 
quality of whole-mounts is not sufficient for species determination, 
can be appropriately allocated. Barcoding thus will foster further 
studies on morphological changes during postembryonic develop- 
ment. 

Extraordinarily high genetic distances within proturan 
barcodes 

Our study revealed huge genetic distances among proturans, for 
both the maximal intraspecific (0-21.3%), as well as maximal 
congeneric interspecific distances (up to 44.7%) (Table 4, Tables 
S2, S3, S4, S5, and S6). Unique among published animal "record 
holders" of intraspecific variation, in Protura COI distances 
accurately allow to differentiate among all unambiguously 



lonescuellum 



A 
A 
A 
▲ 
▲ 
▲ 
▲ 



| ton haybachae \ 



Ion ca rpaticum \ 
-A | ton cf silvaticum \ A- 



lonescuellum 



\Eos sp.| 



\Eoscetium\ 



Eosentomon 



0.1 



Figure 2. Comparison of COI and 28S rDNA in species discrimination of the genera lonescuellum {Ion) and Eosentomon [Eos). NJ tree 
based on K2P distances of COI (left) and the mirrored 28S rDNA results (right). Bootstrap support (maximal support marked with full circles) derived 
from 5000 replicates is maximal for all species and populations. Color code for genera: Acerentomon = violet, lonescuellum = green, 
Acerentulus = orange, Acerella = red, Eosentomon = blue; Austrian sample sites are coded with different icons: Leopoldsberg = square, Eichkogel = 
triangle, and Twimberger Graben = circle. 
doi:1 0.1 371 /journal.pone.0090653.g002 
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Aco n. sp. 
gr. doderoi 




0.05 



Acerella 



Acerella 



0.05 



Figure 3. Comparison of COI and 28S rDNA in species discrimination of the genera Acerentomon (Aco), Acerentulus (Acu) and Acerella 
(Ace). Mirrored NJ tree based on K2P distances of COI (left) and 28S (right). Bootstrap support (maximal support marked with full circles) derived from 
5000 replicates. Color code for genera: Acerentomon = violet, lonescuellum = green, Acerentulus = orange, Acerella = red, Eosentomon = blue; Austrian 
sample sites are coded with different icons: Leopoldsberg = square, Eichkogel = triangle, and Twimberger Graben = circle. 
doi:1 0.1 371 /journal.pone.0090653.g003 



determined species. This resolution power contrasts with most 
other groups, where saturation of COI sequences seems to prevent 
proper delimitation [43-46]. In Protura not only all morphospe- 
cies, but all 5 genera, as well as the three major lineages 
Sinentomata, Eosentomata, and Acerentomata [39], [47] were 
retrieved in the NJ-tree. 

Testing our sequence data for sequence saturation, transver- 
sions are shown to be more common than transitions. Thus 
saturation is clearly present in COI sequences of Protura (Figure 
SI), like expected in ancient phylogenetic lineages, such as 
Protura. The retained resolution power may be due to (i) a low 
speciation rate within Protura, (ii) the low taxon sampling, or a 
combination of these effects. The high intraspecific distances 
among populations may indicate the presence of cryptic species. 
However, we refrain from premature species splitting, since the 
low dispersal capabilities may sufficiently explain the observed 
pattern. 

A closer look on the dispersal capability of Protura 

Protura belong to the primarily wingless hexapods, which 
mostly show restricted dispersal capability compared to pterygote 
insects. Additional factors limiting proturan dispersal are their 
minute body size together with a slow mode of locomotion, as well 



as their euedaphic life style, which restricts them to deeper soil 
layers [48-49]. 

Passive dispersal by wind and water is known from other soil 
arthropods (Collembola: [50-51], Archaeognatha: [52], mites: 
[51]). Since Protura are strictly euedaphic, wind can be excluded 
as a possible dispersal medium. Floating in water is more 
conceivable, since many soil dwelling organisms are tolerant to 
hypoxia or anoxia [53], and pore space can be abruptly filled with 
water due to heavy rain or inundation. Several studies revealed 
that proturans can survive and show signs of active movement 
under water up to seven days [54], [Pomorski, personal comm.] 
and occur in soil habitats subjected to regular inundation [55]. 
Thus, if soil is washed out through heavy rain, specimens of 
Protura may be spread passively and thus be able to conquer new 
habitats. 

Misleading results when setting universal discrimination 
thresholds 

Different operational criteria were proposed to permit species 
delimitation with molecular data including; (i) reciprocal mono- 
phyly [56], (ii) a barcoding gap [57-58], and (iii) absence of 
interlineage reproduction. 

Reciprocal monophyly of proturan lineages is matched by the 
current data, since all species and genera cluster in monophyletic 
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associations. [18] and [57] suggested interspecific distances lOxthe 
average of intraspecific distances as a threshold for a barcoding 
gap. This would yield unrealistic to impossible values in Protura 
(up to 213%). Aside from this, our results show high intraspecific 
distances compared to moderate interspecific distances, as 
reported from other animals [45], [59-60]. Our highest maximal 
intraspecific distances {Ionescuellum haybachae 21.3%) exceeds 
smallest interspecific, congeneric distances (Eosentomon 14.5%) 
(Table 4). Such an overlap makes it difficult to set distance 
thresholds valid across the entire taxon sampling, and species 
delimitation solely dependent on DNA barcoding then becomes 
less effective (see also Tables S7 and S8). Furthermore, higher 
ranges of overlap must be expected once more closely related taxa 
are included. Thus, setting a general cut-off across clades becomes 
problematic and can lead to substantial errors in species 
identification, not only for Protura. For example, due to high 
sequence divergence among Protura it is not possible to decide 
unambiguously, whether the specimen of Eosentomon, in which the 
quality of the whole-mount did not allow for determination to 
species level, is conspecific with Eosentomon cetium. The observed 
distance may both represent intraspecific variation among distinct 
populations, or a species boundary. In the latter case, it would 
represent the lowest congeneric distance of COI within Protura. 

Many authors explain high interspecific distances as an artifact 
of incomplete representation of the distributional range of a 
species (underestimating intraspecific distances), or as the failure to 
sample sister taxa (overestimating interspecific distances) [59], 
[61]. Given the limited taxon sampling of our study, both 
explanations have to be taken into consideration. In only four of 
the investigated species populations of different sampling sites are 
represented, and all of them are restricted to sample sites in 
Austria. All species, in which two populations are covered, 
revealed exceptionally high intraspecific distances. Therefore, we 
expect similarly high distances within other proturan species. 

Searching for sister taxa will cause potential difficulties since 
phylogenetic relationships within the morphologically defined 
"species-groups" are usually unclear. One group of our taxon 
sampling may partially fulfill the demand of dense coverage 
among closely related species: It comprises the three species of the 
genus Acerentomon {A. mains, A. italicum, A. n. sp. gr. doderoi) which are 
representatives of the "doderoi-group". On the one hand the 
maximal interspecific distance among these three species range 
from 31.9% to 33.3% and thus lies within the range of the 
maximal distance to A. sp. gr. microrhinus, which is the sole 
representative of the "microrhinus-group" (32.9% to 33.3%). On the 
other hand, the two species of Acerentulus have the lowest 
interspecific distance of our complete taxon sampling, although 
they represent different species-groups. 

Beyond that, covering the entire geographical range of a species 
is often hampered especially in this hexapod group by the 
incomplete knowledge on its distribution. Research on proturan 
species distribution reveals huge gaps even within the relatively 
well investigated European areas (www.faunaeur.org). These gaps 
must be attributed to the underrepresentation of proturan 
research, in general, and particularly in broad ecological studies, 
but also to a lack of taxonomists working on Protura. 

Advantages of additional markers to support COI in 
challenging groups 

The approach of DNA barcoding with only COI as standard 
marker and the chosen length of this fragment have been 
exhaustively debated (for reviews see [3], [62]). 

To overcome these limitations we lengthened most sequences of 
COI by approximately 900 bp and used a fragment of 28S rRNA 



as a supplementary marker. The extension of the DNA barcoding 
fragment increases the phylogenetic signal and minimizes random 
variation in sequence divergence estimation. The use of an 
additional marker is highly recommendable since it represents an 
independent data set which allows for testing results obtained 
through DNA barcoding by COI [42-43], [63-64]. 

28S rRNA was chosen for several reasons. This gene is known 
to be built up of alternating highly conserved regions and variable 
Divergent Domains [65] . Therefore it provides conserved priming 
sites to design universal primers [65], as well as variability in 
primary sequence and length to delimitate between closely related 
taxa [66]. Furthermore, 28S rRNA as a nuclear gene has several 
advantages over an additional mitochondrial marker. The 
mitochondrial ribosomal DNA is generally assumed to evolve 
more rapidly than the nuclear genes [65]. This is of special 
importance for evolutionary ancient taxa, such as Protura, where 
saturation effects may become problematic. Finally, primers are 
well established in 28S rDNA, also for Protura. Despite the overall 
lack of public proturan sequences, much data is available 
regarding nuclear 18S rDNA and 28S rDNA [30], [67-69]. 
While 18S rDNA was previously used to resolve higher-level 
phylogenetic relationships within arthropods [65], [70], fragments 
of the 28S rDNA achieved more appropriate separation at the 
genus and species levels in Protura, Collembola and Diplura ([67] 
D3-D5, [69] Dl-Dll). 

Our results demonstrate the high potential of the Divergent 
Domains 2 and 3 of 28S rDNA to accurately separate all 
investigated proturans to species and genus levels. As an advantage 
the presence of both conserved and variable regions leaves the 28S 
rDNA potentially informative not only for recent splits, but 
likewise for deeper nodes. Known length variation within the 
Divergent Domains [71], making sequence alignment a tricky and 
time consuming task, does not seem to impede the use of the 
fragment encompassing D2 and D3 in comparisons within 
Protura, but leaves comparisons of distance measures between 
the two genes problematic. 

Conclusions and future perspectives 

Our study impressively shows that the DNA barcoding 
approach with the standardized COI marker region is applicable 
to accurately identify proturans at the species level. The high 
variation of COI at both primer sites demands the use of several 
primers to properly amplify the DNA barcoding fragment from all 
proturan species. One possibility to increase PCR, as well as 
sequencing success, would be the use of primer cocktails as 
implemented in [72]. 

Furthermore, our project revealed high intra- and interspecific 
distances within the taxon sampling. Due to this high variation of 
sequence divergence, an interspecific threshold is not yet 
applicable for Protura. 

In our study we used an additional marker to investigate the 
molecular diversity of Protura, and both markers remarkably 
demonstrated the integrity of traditional morphology. 

For future studies in Protura, we highly endorse the use of 
alternative markers, e.g. 28S rDNA, which are even more 
conserved and reliable for evolutionary ancient taxa. Especially 
in deep-rooted genetic lineages such as the primarily wingless 
hexapods more exhaustive taxon sampling will introduce problems 
of high genetic variation due to saturation of COI. Signals from 
additional markers, can provide independent support for species 
delimitation obtained through DNA barcoding. Otherwise, 
misinterpretation of results can lead to an overestimation of 
species richness or an underestimation of intraspecific variability. 
Additionally, restricted taxon sampling may lead to an underrep- 
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resentation of the complete genetic range of a species, as well as 
overlooking of sister species. We are aware that our taxon 
sampling is limited in terms of geographical distribution and the 
number of analyzed species. Nevertheless, we are confident that 
this pilot study will initialize a new avenue of research to improve 
and facilitate species delimitation and identification in Protura. 

Supporting Information 

Figure SI Complete NJ tree based on K2P distances 
from 84 28S rDNA sequences (fragments D2-D3) of 
Protura. Newly sequenced specimens labeled with lab code 
number (HP), abbreviation for genus, and species name. Color 
code for genera: Acerentomon = violet, Ionescuellum = green, Acerentu- 
lus = orange, Acerella — red, Eosentomon = blue; Austrian sample sites 
are coded with different icons: Leopoldsberg = square, Eichko- 
gel = triangle, and Twimberger Graben = circle. Bootstrap support 
(given below nodes) derived from 5000 replicates. Genus 
abbreviations: Aco — Acerentomon, Ion = Ionescuellum, Acu = Acerentulus, 
Ace = Acerella, and Eos = Eosentomon. 
(PDF) 

Figure S2 DAMBE substitution saturation plot for COI 
sequences of Protura. The number of transitions (s) and 
transversions (v) is plotted against the K2P ( = K80) distance. The 
higher frequency of transversions compared to frequency of 
transitions clearly indicates saturation effects in our COI data set. 
(PDF) 

Table SI Species list of studied proturans, with Indi- 
vidual IDs, developmental stage, sampling location, 
used primer pair and Accession numbers given for each 
individual. 

(XLSX) 

Table S2 Maximal intraspecific K2P distances of COI 
and 28S rDNA sequences of investigated Protura. Note 
that a proper comparison suffers from the necessity to exclude 
regions of highest variability with Aliscore since they are not 
unambiguously alignable among all investigated Protura. Calcu- 
lated with Speciesldentifier 1.7.8. 
(XLSX) 

Table S3 Maximal interspecific, congeneric K2P dis- 
tances of COI and 28S rDNA sequences of investigated 
Protura. Note that a proper comparison suffers from the 
necessity to exclude regions of highest variability with Aliscore 
since they are not unambiguously alignable among all investigated 
Protura. Calculated with Speciesldentifier 1.7.8. 
(XLSX) 

Table S4 Smallest and mean interspecific, congeneric 
K2P distances of COI and 28S rDNA sequences in 
investigated Protura. Note that a proper comparison suffers 
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